Skip to content

[NVBug 5996622] skip syncing for Llama4 experts#1136

Merged
shengliangxu merged 4 commits into
mainfrom
shengliangx/nvbug-5996622
Mar 30, 2026
Merged

[NVBug 5996622] skip syncing for Llama4 experts#1136
shengliangxu merged 4 commits into
mainfrom
shengliangx/nvbug-5996622

Conversation

@shengliangxu
Copy link
Copy Markdown
Collaborator

@shengliangxu shengliangxu commented Mar 30, 2026

What does this PR do?

Type of change: But fix

Llama4 experts are already fused

Testing

Rerun NVBug 5996622 QA test.

Summary by CodeRabbit

  • Bug Fixes

    • Improved reliability of quantization synchronization for expert modules with shared quantizers, preventing unnecessary synchronization attempts.
  • Documentation

    • Updated documentation to clarify quantization behavior for fused expert module configurations.

Llama4 experts are already fused

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
@shengliangxu shengliangxu requested a review from a team as a code owner March 30, 2026 06:38
@shengliangxu shengliangxu requested a review from mxinO March 30, 2026 06:38
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Mar 30, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e7a90207-8181-4774-a592-a47cb024c175

📥 Commits

Reviewing files that changed from the base of the PR and between a3f5c46 and 6463b5e.

📒 Files selected for processing (1)
  • modelopt/torch/quantization/plugins/huggingface.py

📝 Walkthrough

Walkthrough

Updated _QuantSparseMoe.layer_sync_moe_local_experts_amax method to add an additional guard that prevents amax synchronization for non-iterable (fused) experts modules. The method now attempts iteration and returns early on TypeError, ensuring sync_moe_expert_amax is only invoked for iterable experts.

Changes

Cohort / File(s) Summary
Fused Experts Guard
modelopt/torch/quantization/plugins/huggingface.py
Added control flow guard using iter(self.experts) to skip sync_moe_expert_amax invocation for non-iterable (fused) experts modules. Expanded docstring to document synchronization behavior for fused experts with shared quantizers.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title '[NVBug 5996622] skip syncing for Llama4 experts' directly reflects the main change: skipping amax synchronization for Llama4 (fused) experts modules.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Security Anti-Patterns ✅ Passed The PR introduces a minor control flow modification adding a try-except block to safely check if self.experts is iterable, addressing fused experts modules without introducing security anti-patterns.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch shengliangx/nvbug-5996622

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Mar 30, 2026

PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-03-30 22:11 UTC

@codecov
Copy link
Copy Markdown

codecov Bot commented Mar 30, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 70.21%. Comparing base (f04e106) to head (189f236).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1136      +/-   ##
==========================================
+ Coverage   70.19%   70.21%   +0.02%     
==========================================
  Files         230      230              
  Lines       26073    26073              
==========================================
+ Hits        18302    18308       +6     
+ Misses       7771     7765       -6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@shengliangxu shengliangxu requested a review from cjluo-nv March 30, 2026 16:46
@kevalmorabia97 kevalmorabia97 added the cherry-pick-0.44.0 After code freeze, cherry-pick to release branch for next rc (bulk update). Only for bug fixes / doc label Mar 30, 2026
@shengliangxu shengliangxu requested a review from meenchen March 30, 2026 17:16
Copy link
Copy Markdown
Contributor

@meenchen meenchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a regression? wondering how we didn't find this issue in the previous releases.

@shengliangxu
Copy link
Copy Markdown
Collaborator Author

Is this a regression? wondering how we didn't find this issue in the previous releases.

It's some new code logic for syncing local experts. That's why we need Modeling Lib for per-model logics.

@shengliangxu shengliangxu merged commit 21aa8a9 into main Mar 30, 2026
45 checks passed
@shengliangxu shengliangxu deleted the shengliangx/nvbug-5996622 branch March 30, 2026 22:10
kevalmorabia97 pushed a commit that referenced this pull request Apr 2, 2026
### What does this PR do?

Type of change: But fix

Llama4 experts are already fused


### Testing

Rerun NVBug 5996622 QA test.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Bug Fixes**
* Improved reliability of quantization synchronization for expert
modules with shared quantizers, preventing unnecessary synchronization
attempts.
  
* **Documentation**
* Updated documentation to clarify quantization behavior for fused
expert module configurations.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
@kevalmorabia97 kevalmorabia97 removed the cherry-pick-0.44.0 After code freeze, cherry-pick to release branch for next rc (bulk update). Only for bug fixes / doc label Apr 23, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants